Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning
Authors
Abstract
Offline reinforcement learning (learning a policy from a batch of data) is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact one part of the state (an endogenous component) and have limited impact on the remaining part (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains, including financial markets. We discuss algorithms that exploit this property and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we demonstrate that the algorithm outperforms existing offline RL algorithms across different data collection policies in simulated and real-world environments where the regularity holds.
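The abstract's algorithm builds on Fitted-Q Iteration (FQI), which repeatedly regresses Q-values onto bootstrapped targets computed from a fixed batch of transitions. As a rough illustration of the base method only (not the AIR-specific variant the paper analyzes), a minimal tabular sketch, where the "fit" step reduces to averaging targets per state-action pair:

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, n_iters=50):
    """Tabular Fitted-Q Iteration over a fixed offline batch.

    transitions: list of (s, a, r, s_next, done) tuples collected offline.
    Returns an estimated Q-table of shape (n_states, n_actions).
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        # Build regression targets r + gamma * max_a' Q(s', a') from the batch.
        targets = {}  # (s, a) -> list of bootstrapped targets
        for s, a, r, s_next, done in transitions:
            y = r if done else r + gamma * Q[s_next].max()
            targets.setdefault((s, a), []).append(y)
        # "Fit" step: with a tabular function class, the least-squares
        # fit is simply the mean target for each observed (s, a) pair.
        Q_new = Q.copy()
        for (s, a), ys in targets.items():
            Q_new[s, a] = np.mean(ys)
        Q = Q_new
    return Q

# Toy two-state batch: in state 0, action 1 ends the episode with reward 1,
# action 0 loops back to state 0 with reward 0.
batch = [(0, 0, 0.0, 0, False), (0, 1, 1.0, 1, True)]
Q = fitted_q_iteration(batch, n_states=2, n_actions=2)
```

With a general function approximator, the averaging step is replaced by fitting a regressor to the (state, action) → target dataset; the loop structure is otherwise unchanged.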
Similar resources
Learning State and Action Hierarchies for Reinforcement Learning Using Autonomous Subgoal Discovery and Action-Dependent State Space Partitioning
This paper presents a new method for the autonomous construction of hierarchical action and state representations in reinforcement learning, aimed at accelerating learning and extending the scope of such systems. In this approach, the agent uses information acquired while learning one task to discover subgoals for similar tasks. The agent is able to transfer knowledge to subsequent tasks and to...
Generating Hierarchical Structure in Reinforcement Learning from State Variables
This paper presents the CQ algorithm which decomposes and solves a Markov Decision Process (MDP) by automatically generating a hierarchy of smaller MDPs using state variables. The CQ algorithm uses a heuristic which is applicable for problems that can be modelled by a set of state variables that conform to a special ordering, defined in this paper as a “nested Markov ordering”. The benefits of ...
Reinforcement Learning in Continuous State and Action Spaces
Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to ...
Offline Evaluation of Online Reinforcement Learning Algorithms
In many real-world reinforcement learning problems, we have access to an existing dataset and would like to use it to evaluate various learning approaches. Typically, one would prefer not to deploy a fixed policy, but rather an algorithm that learns to improve its behavior as it gains more experience. Therefore, we seek to evaluate how a proposed algorithm learns in our environment, meaning we ...
Reinforcement Learning in Continuous State and Action Space
To solve complex navigation tasks, autonomous agents such as rats or mobile robots often employ spatial representations. These “maps” can be used for localisation and navigation. We propose a model for spatial learning and navigation based on reinforcement learning. The state space is represented by a population of hippocampal place cells whereas a large number of locomotor neurons in nucleus a...
Journal
Journal title: Journal of Artificial Intelligence Research
Year: 2023
ISSN: 1076-9757, 1943-5037
DOI: https://doi.org/10.1613/jair.1.14580